Efficient Exploration and Value Function Generalization in Deterministic Systems
Abstract
We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and, as a solution, propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization. We establish that when the true value function Q* lies within the hypothesis class Q, OCP selects optimal actions over all but at most dim_E[Q] episodes, where dim_E denotes the eluder dimension. We establish further efficiency and asymptotic performance guarantees that apply even if Q* does not lie in Q, for the special case where Q is the span of pre-specified indicator functions over disjoint sets.
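The optimism idea behind the abstract can be illustrated with a small sketch. This is not the paper's OCP algorithm: it assumes the simplest hypothesis class (a tabular one, with one value per period, state, and action), rewards bounded in [0, max_reward], and deterministic tie-breaking. The names `optimistic_episodes` and `chain_step` are hypothetical and introduced only for illustration. The agent keeps an upper bound on every value, acts greedily with respect to those bounds, and tightens them from the deterministic transitions it has observed.

```python
from typing import Callable, Dict, List, Tuple

Step = Callable[[int, int], Tuple[float, int]]  # (state, action) -> (reward, next_state)


def optimistic_episodes(n_states: int, n_actions: int, horizon: int,
                        step: Step, start_state: int, n_episodes: int,
                        max_reward: float = 1.0) -> List[float]:
    """Run greedy-with-respect-to-upper-bounds episodes; return each episode's total reward."""
    # q_upper[t][s][a] is an optimistic bound on the return from period t onward.
    # Assumption: every reward lies in [0, max_reward], so the initial bound is valid.
    q_upper = [[[max_reward * (horizon - t)] * n_actions for _ in range(n_states)]
               for t in range(horizon)]
    # Observed deterministic transitions, keyed by (period, state, action).
    known: Dict[Tuple[int, int, int], Tuple[float, int]] = {}
    returns: List[float] = []

    for _ in range(n_episodes):
        s, total = start_state, 0.0
        for t in range(horizon):
            # Act greedily with respect to the optimistic bounds (first maximizer wins ties).
            a = max(range(n_actions), key=lambda b: q_upper[t][s][b])
            r, s_next = step(s, a)
            known[(t, s, a)] = (r, s_next)
            total += r
            s = s_next
        returns.append(total)

        # Propagate what is known backward in time: an observed (t, s, a) gets
        # the bound r + max_a' q_upper[t + 1][s'][a'], which stays optimistic.
        for t in range(horizon - 1, -1, -1):
            for (tt, ks, ka), (r, s_next) in known.items():
                if tt == t:
                    future = max(q_upper[t + 1][s_next]) if t + 1 < horizon else 0.0
                    q_upper[t][ks][ka] = r + future
    return returns


def chain_step(s: int, a: int) -> Tuple[float, int]:
    """Hypothetical 4-state chain: action 0 resets to state 0, action 1 moves right."""
    if a == 0:
        return (0.5 if s == 0 else 0.0), 0   # small reward for staying at the start
    return (1.0 if s == 3 else 0.0), min(s + 1, 3)  # large reward only at the far end


if __name__ == "__main__":
    episode_returns = optimistic_episodes(4, 2, 8, chain_step, start_state=0, n_episodes=80)
    suboptimal = sum(1 for g in episode_returns if g < max(episode_returns))
    print("final return:", episode_returns[-1], "suboptimal episodes:", suboptimal)
```

In this sketch, an episode that visits only already-observed (period, state, action) triples realizes exactly its optimistic bound and is therefore optimal, so the number of suboptimal episodes is at most the number of such triples. That count plays the role the eluder dimension plays in the abstract's bound, but only for this tabular simplification.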
Related resources
Comparing Geostatistical Seismic Inversion Based on Spectral Simulation with Deterministic Inversion: A Case Study
Seismic inversion is a method that extracts acoustic impedance data from the seismic traces. Source wavelets are band-limited, and thus seismic traces do not contain low and high frequency information. Therefore, there is a serious problem when the deterministic seismic inversion is applied to real data and the result of deterministic inversion is smooth. Low frequency component is obtained fro...
Generalization and Exploration via Randomized Value Functions
We propose randomized least-squares value iteration (RLSVI), a new reinforcement learning algorithm designed to explore and generalize efficiently via linearly parameterized value functions. We explain why versions of least-squares value iteration that use Boltzmann or ε-greedy exploration can be highly inefficient, and we present computational results that demonstrate dramatic efficiency gains...
A Novel Combinatorial Approach to Discrete Fracture Network Modeling in Heterogeneous Media
Fractured reservoirs contain about 85 and 90 percent of oil and gas resources respectively in Iran. A comprehensive study and investigation of fractures as the main factor affecting fluid flow or perhaps barrier seems necessary for reservoir development studies. High degrees of heterogeneity and sparseness of data have incapacitated conventional deterministic methods in fracture network modelin...
Gaussian Processes for Sample Efficient Reinforcement Learning with RMAX-Like Exploration
We present an implementation of model-based online reinforcement learning (RL) for continuous domains with deterministic transitions that is specifically designed to achieve low sample complexity. To achieve low sample complexity, since the environment is unknown, an agent must intelligently balance exploration and exploitation, and must be able to rapidly generalize from observations. While in...
Adaptive-Resolution Reinforcement Learning with Efficient Exploration in Deterministic Domains
We propose a model-based learning algorithm, the Adaptive-resolution Reinforcement Learning (ARL) algorithm, that aims to solve the online, continuous state space reinforcement learning problem in a deterministic domain. Our goal is to combine adaptive-resolution approximation scheme with efficient exploration in order to obtain fast (polynomial) learning rates. The proposed algorithm uses an a...